ABSTRACT

Intelligent tutoring systems and computer aided learning en- 
vironments aimed at developing problem solving produce 
large amounts of transactional data which make it a chal- 
lenge for both researchers and educators to understand how 
students work within the environment. Researchers have 
modeled student-tutor interactions using complex networks 
in order to automatically derive next step hints. However, 
there are no clear thresholds for the amount of student data 
required before the hints can be produced. We introduce a 
novel method of estimating the size of the unobserved
interaction network from a sample by leveraging Good-Turing
frequency estimation. We use this estimation to predict size,
growth, and overlap of interaction networks using a small
sample of student data. Our estimate is accurate with as few
as 10-30 students and is a good predictor for the growth
of the observed state space for the full network, as well as 
the subset of the network which is usable for automatic hint 
generation. These methods provide researchers with metrics 
to evaluate different state representations, student popula- 
tions, and general applicability of interaction networks on 
new datasets. 

1. INTRODUCTION 

Data-driven methods to provide automatic hints have the 
potential to substantially reduce the cost associated with 
developing tutors with personalized feedback. Modeling the 
student-tutor interactions as a complex network provides a 
platform for researchers to automatically generate next step 
hints. An Interaction Network is a complex network repre- 
sentation of all observed student and tutor interactions for a 
given problem in a game or tutoring system. In addition to 
their usefulness for automatically generating hints, interac- 
tion networks can provide an overview of student problem- 
solving approaches for a given problem. 

Data-driven approaches cannot reliably produce feedback 
until sufficient data has been collected, a problem often
referred to as the Cold Start problem. The precise amount of
data needed varies by problem and environment. However,
some properties of Interaction Networks allow us to esti- 
mate how much data is needed. Eagle et al. explored the 
structure of these student interaction networks and argued 
that networks could be interpreted as an empirical sample 
of student problem solving [5]. Students employing similar 
problem-solving approaches will explore overlapping areas 
of the Interaction Network. The more similar a group of 
students is, the smaller the overall explored area of the in- 
teraction network will ultimately be. Since we expect dif- 
ferent populations of students to have different interaction 
networks, and different domains to require varying amounts 
of student data before feedback can be given, good metrics 
for the current and predicted quality of Interaction Networks 
are important. 

In this work, we adapt Good-Turing frequency estimation
to interaction level data to predict the size, growth, and
“hintability” of interaction networks. Good-Turing frequency
estimation estimates the probability of encountering an ob- 
ject of a hitherto unseen type, given the current number 
and frequency of observed objects [8]. It was originally de- 
veloped by Alan Turing and his assistant I. J. Good for use 
in cryptography efforts during World War II. In our con- 
text, network states (vertices) are the object types, and the 
student interactions (edges) leading to those states are ob- 
servations. 

We present several metrics derived from Good-Turing
frequency estimation. Our hypotheses are that these metrics:

H1: Predict the probability that a student interaction will
result in a state which was not previously observed.
H2: Describe the proportion of the network that has been
observed for a population.
H3: Predict the expected size and growth of an interaction
network when additional student data is added.
H4: Provide a quantitative comparison of different state
representations for their ability to represent greater
proportions of the network.
H5: Are useful for comparing different populations of users
in how they explore the problem space.

Additionally, we use the metrics to explore the subset of the 
interaction network that is useful for providing automati- 
cally generated hints. This provides us with estimates of the 
size, growth, and coverage of automatically generated hints. 
We find that our metrics quickly become accurate after col- 
lecting a sample of about 10 students. This has value as a 
metric to compare the quality of the interaction networks,
and will aid future researchers in determining an adequate
state representation. We also show how two experimental 
groups, despite having the same amount of network cover- 
age, have substantially different numbers of unique states. 
This supports previous work, suggesting that different pop- 
ulations of students produce different interaction networks 
[5], which has broad implications for generating hints as well
as using the networks to evaluate student behavior. 

1.1 Previous Work 

Creation of adaptive educational programs is costly. This is, 
in part, because developing content for intelligent tutors re- 
quires multiple areas of expertise. Content experts and ped- 
agogical experts must work with tutor developers to identify 
the skills students are applying and the associated feedback 
to deliver [13]. In order to address the difficulty in author- 
ing intelligent tutoring content, Barnes and Stamper built 
an approach called the Hint Factory to use student data to 
build a Markov Decision Process (MDP) of student problem- 
solving approaches to serve as a domain model for automatic 
hint generation [18]. Hint Factory has been applied in
tutoring systems and educational games across several domains
[7, 14, 6], and has been shown to increase student retention in
tutors [19].

Early work with the Hint Factory method used a Markov De- 
cision Process constructed from students’ problem-solving 
attempts. Eagle and Barnes further developed this struc- 
ture into a complex network representation of student in- 
teractions with the system, called an Interaction Network 
[5]. Complex networks are graphs or networks which con- 
tain non-trivial topological features unlikely to appear in 
simple or random networks. The Interaction Network rep- 
resentation can be used as a visualization of student work 
within tutors. The effectiveness of Interaction Networks as
visualizations was shown by Johnson et al., who created
the visualization tool InVis to aid instructors in analyzing
student-tutor data [11].

Other approaches to automated generation of feedback have 
attempted to condense similar solutions in order to address 
sparse data sets. One such approach converts solutions into 
a canonical form by strictly ordering the dependencies of 
statements in a program [15]. Another approach compares 
linkage graphs modelling how a program creates and mod- 
ifies variables, with nested states created when a loop or 
branch appears in the code [10]. In the Andes physics tutor, 
students may ask for hints about how to proceed. Sim- 
ilarly to Hint Factory-based approaches, a solution graph 
representing possible correct solutions to the problem was 
used. However, their solution space was explored procedurally
rather than being derived from student data, and they
used plan recognition to decide which of the problem
derivations the student was working towards [20].

Interaction networks are scale-free networks. This is a prop- 
erty of complex networks whose degree distribution is heavy- 
tailed, often a power law distribution. In practice, this 
means that a few vertices have degree that is much larger 
than the average, while many vertices have degree some- 
what lower than average [5]. Eagle et al. argued that stu- 
dents with similar problem solving ability and preferences 
would travel into similar parts of the network, resulting in
some states being more important to the problem than others
[5]. Using these “hub” states, sub-regions of the network
corresponding to high-level approaches to the problem were 
derived. These sub-regions captured problem-solving differ- 
ences between two experimental groups [4]. 

2. METHODS AND MATERIALS 

For the purposes of this work, we are using datasets from 
three different environments to build our interaction net- 
works. Summaries of these datasets are found in Table 1. 
The first dataset is from the Deep Thought tutor, used in 
previous work by Stamper et al. [19]. This dataset was
collected for a between-groups experiment investigating the use
of data-driven hints, so we split the dataset into two groups, 
DT1-C, the control group from that experiment, and DT1- 
H, the group that received hints. We selected this dataset 
to explore and evaluate H5. 

The second dataset comes from the game BOTS. Here, we 
have the same students and interactions represented in two 
different ways: First, using codestates (the programs users 
wrote) and second using worldstates (the output of those 
programs). The advantages and disadvantages of these state 
representations were explored in previous work by Peddycord
and Hicks [14]. We split this dataset into two groups
as well (BOTS-C and BOTS-W), one for each state
representation used. We selected this dataset for evaluation of
H4.

Our third and largest dataset comes from an updated ver- 
sion of the Deep Thought tutor, called Deep Thought 3. 
Unlike with the other datasets, Deep Thought 3 features an
AI problem selection component [12]. This means that not
all students will have had access to all problems. In
addition, there is a larger number of problems in this dataset.
We selected this dataset because the larger number of problems
effectively splits student data across multiple networks.
H1-H3 are relevant to measuring the quality of networks
produced for new problems.

Table 1: Dataset summary: the total number of stu- 
dents in the dataset, the number of distinct prob- 
lems, and the average number of students repre- 
sented in each network. 


Dataset   Total N   Num Problems   Mean Net N
DT1-H     203       11             83.73
DT1-C     203       11             63.82
DT3       341       59             78.41
BOTS-C    125       12             99.75
BOTS-W    125       12             99.75


2.1 Constructing an Interaction Network 

An Interaction Network is a complex network representation 
of all observed student and tutor interactions for a given 
problem in a game or tutoring system. To construct an In- 
teraction Network for a problem, we collect the set of all 
solution attempts for that problem. Each solution attempt 
is defined by a unique user identifier, as well as an ordered 
sequence of interactions, where an interaction is defined as 
{initial state, action, resulting state}, from the start of the 


Proceedings of the 8th International Conference on Educational Data Mining 





problem until the user solves the problem or exits the sys- 
tem. The information contained in a state is sufficient to 
precisely recreate the tutor’s interface at each step. Simi- 
larly, an action is any user interaction which changes the 
state, and is defined as {action name, pre-conditions, post- 
conditions}. In Deep Thought, for example, an action would 
be the logical axiom applied, the statements it was applied 
to, and the resulting derived statement. Figure 1 displays 
two Deep Thought interactions. The first interaction works 
forward from STEP0 to STEP1 with action SIMP
(simplification) applied to (Z ∧ ¬W) to derive ¬W. The second
interaction works backward from STEP1 to STEP2 with action
B-ADD (backwards addition) applied to (X ∨ S) to
derive the new, unjustified statement S.



Figure 1: Example of state to state transitions within the
Deep Thought (DT1) propositional logic tutoring system.

Once the data is collected, we use a state matching function 
to combine similar states. In Deep Thought, we combine 
states that consist of all the same logic statements, regard- 
less of the order in which those statements were derived. 
This way, the resulting state for a step STEP0, STEP1, or
STEP2 in Figure 1 is the set of justified and unjustified
statements in each screenshot, regardless of the order in which
each statement was derived. In BOTS, two state matching
functions were used: one which combined states based on the
code in students’ programs, and another which instead used 
the output of those programs. Similarly, we use an action 
matching function to combine actions which result in simi- 
lar states, while preserving the frequency of each observed 
interaction. 
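
The construction described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `build_interaction_network`, the input layout (a user identifier plus a list of interaction triples), and the use of `frozenset` as an order-insensitive state matching function are all assumptions made for the example.

```python
from collections import Counter


def build_interaction_network(attempts, match_state=frozenset):
    """Build state and edge frequency tables from solution attempts.

    attempts: iterable of (user_id, [(initial_state, action, resulting_state), ...])
    match_state: state matching function mapping a raw state to a canonical key
                 (assumed here: frozenset, so statement order is ignored).
    """
    state_freq = Counter()  # canonical state -> interactions arriving at it
    edge_freq = Counter()   # (source, action, target) -> observed frequency
    for _user_id, interactions in attempts:
        for initial, action, resulting in interactions:
            src, dst = match_state(initial), match_state(resulting)
            state_freq[dst] += 1
            edge_freq[(src, action, dst)] += 1
    return state_freq, edge_freq


# Hypothetical toy data: two students reach the same set of statements
# in a different order, so their resulting states are merged.
attempts = [
    ("s1", [(["A"], "SIMP", ["A", "B"]), (["A", "B"], "ADD", ["A", "B", "C"])]),
    ("s2", [(["A"], "SIMP", ["B", "A"])]),
]
state_freq, edge_freq = build_interaction_network(attempts)
```

Keeping frequencies on both states and edges preserves the information needed later for the frequency-of-frequencies counts.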

2.2 Providing Hints 

Stamper and Barnes’ Hint Factory approach generates a 
next step Hint Policy by modeling student-tutor interactions 
as a Markov Decision Process [18]. This has been adapted 
to work with interaction networks by using a Value Iteration
algorithm on the states [5]. We generate a graph of
all student interactions, combining identical states using a 
state matching function. Then, we calculate a fitness value 
for each state. We assign a positive value (100) to each goal 
state, that is a state configuration representing a solution to 
the problem. We assign an error cost (-5) for error states. 
We also assign a small cost to performing any action, which 
biases hint-selection towards shorter solutions. We then
calculate fitness values V(s) for each state s, where R(s) is the
initial fitness value for the state, γ is a discount factor, and
P(s, s') is the observed frequency with which users in state s
take an action resulting in state s'. After this, we use value
iteration [2] to repeatedly assign each state a value based on
its neighbors and action costs, weighted by frequency.
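
One way to read the procedure above is as a frequency-weighted value update, sketched below. The goal reward (100) and error cost (-5) come from the text; the discount factor `gamma=0.9`, the per-action `step_cost=1.0`, the number of sweeps, and the function names are assumptions for illustration, and the exact update used by Hint Factory may differ.

```python
def value_iteration(transitions, reward, gamma=0.9, step_cost=1.0, sweeps=100):
    """transitions: {state: {next_state: observed_count}}; reward: {state: R(s)}."""
    states = set(transitions) | {t for nxt in transitions.values() for t in nxt}
    values = {s: reward.get(s, 0.0) for s in states}
    for _ in range(sweeps):
        updated = {}
        for s in states:
            nxt = transitions.get(s, {})
            total = sum(nxt.values())
            if total == 0:
                # No observed outgoing interactions: keep the initial fitness.
                updated[s] = reward.get(s, 0.0)
                continue
            # P(s, s') is the observed frequency of moving from s to s'.
            expected = sum(count / total * values[t] for t, count in nxt.items())
            updated[s] = reward.get(s, 0.0) - step_cost + gamma * expected
        values = updated
    return values


def next_step_hint(transitions, values, state):
    """Suggest the observed child state with the best value, if any."""
    children = transitions.get(state, {})
    return max(children, key=values.get) if children else None


# Hypothetical three-step problem with one error state.
transitions = {"start": {"a": 2, "err": 1}, "a": {"goal": 2}}
reward = {"goal": 100.0, "err": -5.0}
values = value_iteration(transitions, reward)
hint = next_step_hint(transitions, values, "start")
```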

After applying this algorithm, we can provide a hint to guide 
the user toward the goal by selecting the child state with the 
best value. We can do this for any observed state, provided 
that a previous user has successfully solved the problem after 
visiting that state. In the original work with Hint Factory 
on the Deep Thought tutor, the algorithm was permitted to 
backtrack to an earlier state if it failed to find a hint from 
the current state. However, not all environments allow the
user to backtrack, and backtracking hints risk providing
irrelevant information. Because of this inconsistency
across domains, we did not permit backtracking
for the purposes of the comparisons in this paper. 

We define a state S to be Hintable if S lies on a path which
ends at a goal state. We define the Hintable network to
be the subset of the interaction network containing only
Hintable states and the edges between Hintable states; that is,
the induced subgraph on the set of Hintable states.
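
The Hintable set defined above can be found with a backward search from the goal states. The sketch below assumes the network is available as a list of directed (source, target) state pairs; the function names and the toy edge list are hypothetical.

```python
from collections import defaultdict, deque


def hintable_states(edges, goal_states):
    """Return every state from which some goal state is reachable."""
    parents = defaultdict(set)
    for src, dst in edges:
        parents[dst].add(src)
    seen = set(goal_states)
    queue = deque(goal_states)
    while queue:
        state = queue.popleft()
        for parent in parents[state]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def hintable_network(edges, goal_states):
    """Induced subgraph on the Hintable states."""
    keep = hintable_states(edges, goal_states)
    return [(s, t) for s, t in edges if s in keep and t in keep]


# Hypothetical network: "dead" never leads to the goal, so it is dropped.
edges = [("start", "a"), ("a", "goal"), ("start", "dead")]
states = hintable_states(edges, {"goal"})
subgraph = hintable_network(edges, {"goal"})
```

Searching backward from the goals visits each edge at most once, which avoids a separate reachability check per state.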

2.3 Cold Start Problem 

Barnes and Stamper [1] approached the question of how 
much data is needed to get a certain amount of overlap in 
student solution attempts by incrementally adding student 
attempts and measuring the step overlap over a large series 
of trials. This was done with the goal of producing automat- 
ically generated hints, and solution attempts that did not 
reach the goal were excluded. Peddycord et al. [14] used a 
similar technique to evaluate differences in overlap between 
two different interaction network state representations. 

The “Cold Start problem” is an issue that arises in all data- 
driven systems. For early users of the system, predictions 
made are inaccurate or incomplete [17, 16]. If there are in- 
sufficient data to compare to (not enough user ratings, or 
not enough student attempts) then the quality of the rec- 
ommendations suffers and in some cases no recommendation 
can be provided. The term is commonly used in the field 
of collaborative filtering and recommender systems, but it 
can be used to describe three related issues, the “new user,” 
the “new item,” and the “new community” [3] Cold Start 
problems. The “new user” problem refers to the difficulty 
of making recommendations to a user who has performed 
no actions. The “new item” problem refers to the difficulty 
of suggesting users visit a newly added, unobserved state. 
The new community Cold Start problem refers to situations
where not enough observations exist to make recommendations
for new users. The “new community” definition corresponds
most closely to the difficulty of generating hints for
an entirely new problem in an intelligent tutoring system or
educational game. 

To measure our ability to address this problem, we add all 
interactions from a single student, one at a time, to the in- 
teraction network. This is in order to simulate the growth 
of the network. We repeat this process for each student, 
measuring the performance of our model each time. We 
measured the proportion of currently observed states to to- 
tal observed states for the entire data set, as well as for the 
subset of states from which a goal is reachable. To control 
for ordering effects, we repeated this trial 1000 times us- 
ing a different random ordering of students each time, and 
aggregated the results. 
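
The simulation described above can be sketched as follows. This is a simplified illustration that tracks only the set of states each student visited; the function name `coverage_curve`, the input layout, and the fixed random seed are assumptions, and the full analysis would also track the goal-reachable subset.

```python
import random


def coverage_curve(student_states, trials=1000, seed=0):
    """Mean proportion of all observed states seen after adding k students.

    student_states: {student_id: set of states that student visited}
    Returns a list whose entry k-1 is the average over random orderings.
    """
    rng = random.Random(seed)
    students = list(student_states)
    all_states = set().union(*student_states.values())
    totals = [0.0] * len(students)
    for _ in range(trials):
        rng.shuffle(students)  # a new random ordering controls for order effects
        seen = set()
        for k, student in enumerate(students):
            seen |= student_states[student]
            totals[k] += len(seen) / len(all_states)
    return [t / trials for t in totals]


# Hypothetical data for three students.
student_states = {"s1": {"a", "b"}, "s2": {"b", "c"}, "s3": {"a", "c", "d"}}
curve = coverage_curve(student_states, trials=200)
```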


2.4 Good-Turing Network Estimation 

We present a new method for estimating the size of the un- 
observed portion of a partially constructed Interaction Net- 
work. Our estimator makes use of Good-Turing frequency 
estimation [8]. Good-Turing frequency estimation estimates
the probability of encountering an object of a hitherto un- 
seen type, given the current number and frequency of ob- 
served objects. It was originally developed by Alan Turing 
and his assistant I. J. Good for use in cryptography efforts 
during World War II. Gale and Sampson revisited and sim- 
plified the implementation [8]. In its original context, given 
a sample text from a vocabulary, the Good-Turing Estima- 
tor will predict the probability that a new word selected 
from that vocabulary will be one not previously observed. 

The Good-Turing method of estimation uses the frequency 
distribution, the “frequency of frequencies,” from the sample 
text in order to estimate the probability that a new word 
will be of a given frequency. Based on this distribution, 
the probability of observing a new word in an additional 
sample is estimated with the observed proportion of words 
with frequency one. This estimate of unobserved words is 
used to adjust the probabilities of encountering words of 
frequencies greater than one. 

We adapt the Good-Turing Estimator to interaction net- 
works by using the states with an observed frequency of one 
to estimate the proportion of “frequency zero” states. In- 
teraction networks represent the observed interactions and 
therefore we also use this value to estimate the probability 
that a new interaction will transition into a new state. We
use P_0 as the expected probability of the next observation
being an unseen state. P_0 is estimated by:

    P_0 = \frac{N_1}{N}    (1)

where N_1 is the total number of frequency 1 states, and N
is the total number of interaction observations. Since N_1 is
the largest group of states, the observed value of N_1 is a
reasonable estimate of P_1. P_0 can then be used to smooth the
estimated proportions of the other states. The proportion
of states with observed frequency r is found by:

    P_r = \frac{(r+1) S(N_{r+1})}{N}    (2)

where S() is a smoothing function that adjusts the value for
large values of r [8].

Our version of P_0 is the probability of encountering a new
state (a state that currently has a frequency of zero) on a
new interaction. We also interpret this as the proportion of
the network missing from the sample. We will refer to an
interaction with an unobserved state as having fallen off of
the interaction network. We will use the complement of P_0
as the estimate of network coverage, I_c, the probability that
a new interaction will remain on the network: I_c = 1 - P_0.

The state space of the environment is the set of all possible
state configurations. For both the BOTS game and the
Deep Thought tutor the potential state space is infinite. For
example, in the Deep Thought tutor a student can always
use the addition rule to add new propositions to the state.
However, as argued in Eagle et al. [5], the actions that
reasonable humans perform are only a small subset of the
theoretical state space; the actions can also be different for
different populations of humans. We will refer to this subset
as the Reasonable State Space, with unreasonable being
loosely defined as actions that we would not expect a human
to take. An interaction network is an empirical sample of
the problem solving behavior from a particular population,
and is a subset of the state space of all possible reasonable
behaviors. Therefore, our metrics P_0 and I_c are estimates
of how well the observed interaction network represents the
reasonable state space.

Figure 2: The growth of new states as new students
are added for each problem, for each dataset.
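
A minimal sketch of the unobserved-network and coverage estimates, assuming the sample is given as a flat list with one resulting state per interaction. The function name is hypothetical, and returning a missing proportion of 1.0 for an empty sample is an assumption of this example rather than something stated in the text.

```python
from collections import Counter


def good_turing_coverage(state_observations):
    """Return (p0, coverage): the estimated missing proportion and its complement.

    state_observations: one entry per interaction, naming the resulting state.
    """
    freq = Counter(state_observations)
    n_total = sum(freq.values())                     # N: interaction observations
    n_one = sum(1 for c in freq.values() if c == 1)  # N_1: states seen exactly once
    # Assumption: with no data at all, treat the whole network as unobserved.
    p0 = n_one / n_total if n_total else 1.0
    return p0, 1.0 - p0


# Hypothetical sample: states c and d were each seen once in 7 interactions.
p0, coverage = good_turing_coverage(["a", "a", "b", "c", "a", "d", "b"])
```

Only the frequency-of-frequencies count for r = 1 is needed here, so the estimate can be updated cheaply as each new interaction arrives.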


3. RESULTS 

In order to evaluate the performance of the unobserved
network estimator, P_0, and the network coverage estimator,
I_c, for each problem in each of our 5 datasets we randomly
added students from the sample, one at a time, until all
student data had been included. At each step, T, we recorded
the values of our estimators using only the data that had 
been encountered up until then. This simulates a real world 
use-case, where additional students are added over time. We 
repeated this process 1000 times and averaged the results. 
Figure 2 shows the growth of unique states as students are
added for the interaction networks generated by each problem
(line) in each of the five datasets.

Figure 3: The average absolute error between the estimated
number of new states and the observed new states over the
number of students for all problems in each of the four
datasets. P_0 accurately predicts the observed values after
roughly 10 students, rarely being off by more than one after
that.

3.1 H1: Prediction of New States

To evaluate P_0 for the prediction of new states (states that
have frequency 0 at time T but will have frequency 1 at time
T+1), at each T we add an additional student and compare the
expected number of new frequency 1 states, E_S1, with the
observed number, O_S1. Across all five datasets,
Figure 3 shows the differences between the expected and
observed number of new states. The P_0 × Interactions
prediction for new states follows the observed number closely;
the estimates increase in accuracy rapidly over the first ten
students and are rarely off by more than a fraction of a state
afterwards. Figure 4 shows the results of running this process
on only the hintable portion of the interaction network
for each data set.
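
A single step of this comparison might look like the sketch below. It assumes the data seen so far and the next student's data are both flat lists of resulting states and that at least one interaction has already been observed; the function name and the toy values are hypothetical.

```python
from collections import Counter


def new_state_prediction_error(seen_states, next_student_states):
    """Compare P_0 x Interactions with the new states actually observed.

    seen_states: resulting states of all interactions up to time T (non-empty).
    next_student_states: resulting states of the student added at time T+1.
    """
    freq = Counter(seen_states)
    p0 = sum(1 for c in freq.values() if c == 1) / len(seen_states)
    expected = p0 * len(next_student_states)
    observed = len(set(next_student_states) - set(freq))
    return expected, observed, abs(expected - observed)


# Hypothetical step: b and c are singletons, so P_0 = 2 / 4.
expected, observed, error = new_state_prediction_error(
    ["a", "a", "b", "c"], ["a", "d"]
)
```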

3.2 H2: Network Coverage 

We have defined network coverage I_c as the proportion of
interactions which lie within the previously observed network.
Another interpretation is that I_c is the probability of
an interaction resulting in a state that has been previously
observed. This value is the complement of P_0. Figures 5 and
7 display the results of network coverage and its growth as
additional students are added.

3.3 H3: Predicting Future Network Size 

In order to further evaluate the use of P_0 and I_c, we
calculated a prediction for the final size of the network, given
the number of students in each dataset, at each time stamp.
The equation for this prediction is:

    |V(IN)| = (NewSample \times P_0) + U_T    (3)

where |V(IN)| is the number of unique vertices (states) in
the final network, NewSample is the number of new interactions
added, P_0 is the estimated probability of adding a new state, and
U_T is the number of unique states observed at time T. The
results are averaged across all problems for each dataset and
are presented in Figures 8 and 9. This prediction rapidly
improves and, after roughly 20% of the sample is added, can
accurately predict the final number of unique states for the
network. This, combined with the accuracy of P_0, reveals the
short term and long term accuracy of the estimator.

Figure 4: For the hintable states, the average difference
between the estimated number of new states
and the observed new states over the number of students
for all problems in each of the four datasets.
P_0 accurately predicts the observed values after
roughly 10 students, rarely being off by more than
one after that.
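
Equation (3) translates directly into code. In the sketch below the function name and every number in the example (unique states so far, P_0, remaining students, and mean interactions per student) are assumed values chosen only to show the arithmetic.

```python
def predict_final_states(unique_states_now, p0, new_interactions):
    """Equation (3): U_T plus the expected number of newly discovered states."""
    return unique_states_now + p0 * new_interactions


# Hypothetical snapshot: 120 unique states so far, P_0 = 0.1, and 25 remaining
# students who are assumed to average 20 interactions each.
remaining_interactions = 25 * 20
prediction = predict_final_states(120, 0.1, remaining_interactions)
```

Note that the sketch, like Equation (3) as written, holds P_0 fixed over the whole new sample.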


3.4 H4: Comparing State Matching Functions 

The network coverage metric, I_c, allows an easy method of
estimating the differences in state matching functions and
student network overlap. We can use I_c with two potential
matching functions, and get an estimate of the remaining
network, to quickly compare different potential state
representations as well as to find a state generalization that will
allow for a desired amount of network coverage.

The estimate based on the above methods has proven useful
for comparing State Matching functions to help determine
which produces more relevant hints. Figure 6 shows the
BOTS interface, with the user’s program (codestate) and
the game world (worldstate) both illustrated. In previous
work investigating the Cold Start problem on the BOTS
data set, we measured “coverage” in terms of how much of
the newly added test data was already present in the training
set [9, 14]. Compare this analysis to Figure 5, which shows
the estimated probability that a student’s next action will
result in an observed state, I_c. After 100 students, the
probability that a student will generate a new codestate is still
quite high, P_0 > .25. In comparison, after the same number
of students, the probability of generating a new worldstate
is extremely low, P_0 < .02. This result supports both
our intuition and our results from the previous work:
students will continue to generate new codestates, but
these different codestates will collapse to previously observed
worldstates.

Figure 5: The estimated network coverage I_c for
each of the 5 datasets. Note the poor coverage for
the BOTS-C dataset. The BOTS-W state representation is more
general and has much higher coverage.
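
The codestate and worldstate comparison can be illustrated by computing coverage for the same observations under two state matching functions. Everything in this sketch is hypothetical: the function name, the (program text, world output) pairs, and the lambda matchers stand in for the real BOTS representations.

```python
from collections import Counter


def coverage_under(match_state, raw_states):
    """Estimated coverage (1 - N_1 / N) for a given state matching function."""
    freq = Counter(match_state(s) for s in raw_states)
    n_one = sum(1 for c in freq.values() if c == 1)
    return 1.0 - n_one / sum(freq.values())


# Hypothetical observations: (program text, resulting world output).
raw = [
    ("F;F", "x=2"),
    ("F F", "x=2"),
    ("F;R;R;R;R;F", "x=2"),
    ("R", "x=0"),
    ("R", "x=0"),
]
code_coverage = coverage_under(lambda s: s[0], raw)   # match on the program
world_coverage = coverage_under(lambda s: s[1], raw)  # match on the output
```

In this toy sample three distinct programs collapse to one world output, so the more general representation shows higher estimated coverage.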


Figure 7: For the hintable network: the estimated
network coverage I_c for each of the 5 datasets. Even
the lowest performing hint network, BOTS-C, reaches
roughly 70% coverage by 100 students.



Figure 6: An image of the main gameplay interface 
for BOTS. The left hand side of the screen shows 
the user’s program, used to derive code states. The 
right-hand side shows the game world, where the 
program output determines the world states. 


3.5 H5: Comparing Populations 

Samples from different populations have different resulting 
interaction networks. The size of the represented network 
can tell us about the similarity of student approaches in the 
sample. If students are more alike in the types of actions 
they perform, fewer students will be needed to achieve a 
similar amount of overlap. We can also see that adding
students from a dissimilar population will not always increase
estimated network coverage (I_c), and can potentially
decrease it. This has implications for
building hints from one population and applying them to
another. In other work we have already shown that different
groups are likely to visit different parts of the networks [4].
Here we expand on that analysis by showing that the two
groups, while having the same amount of network coverage,
have a different number of unique states. Table 2 shows the
results for the Hint group, which received hints on a
subset of the problems, and the Control group, which never
received hints. This corresponds with results from Eagle et
al. [4], in which they uncovered significant differences in
students' overall approaches. This result adds to that an
estimation of how complete each network was, revealing that
additional data was not likely to change the result. It also
shows some evidence for a trail-blazing effect: when
provided hints, students collectively explore a smaller area of
the state space.

Table 2: Different populations have different spread
in problem exploration.

Group     P_0    States   Interactions   F_1
Hint      0.09   514.61   2709.84        250.09
Control   0.10   720.12   3904.92        340.00


3.6 Estimating the effect of filtering 

Visualizations must struggle with an “information to ink”
ratio. There is a trade-off between displaying full information
and overwhelming the viewer, and displaying only the
most frequent states and potentially misleading the viewer
by eliminating information. InVis, a visualization tool for
exploring Interaction Networks, allowed users to filter by
frequency [11]. We can use Good-Turing estimation to
calculate the amount of information removed by filtering
at a given frequency threshold. P_0 is the proportion of the
network missing, and

    I_{c>r} = I_c - P_{1 \ldots r} + P_0

where r is a threshold value for removing low frequency
states, and P_{1 \ldots r} is the sum of P_1 through P_r.
This should be a useful metric for
visualizations for measuring the amount of network that is
hidden by filtering. It is also useful to show that sometimes
a large number of graphical elements can be removed, with
only a small amount of interaction information lost.

Figure 8: Prediction of total final number of states,
as observed number of states increases. Note that
for small t, the estimate is very high (up to 300%
overprediction), but becomes fairly accurate after
roughly 20% of the sample is measured.


4. DISCUSSION 

Good-Turing estimation works well in the context of
interaction networks. We were able to provide an easily
calculable estimate of the proportion of the network not yet
observed, P_0. This value alone is a useful high level metric
for the percentage of times a student interaction results in a
previously unobserved state. The P_0 score for the hintable
network is likewise an estimate of the probability that a
student will “fall off” of the network from which we can provide
feedback. Our network coverage metric I_c allows a quick
and easy to calculate method of comparing different state
representations, as well as quantifying the difference. We
believe that this metric can replace the commonly used cold
start method of evaluating the “hintability” of a network. I_c
is also valuable to quickly gauge the applicability of a new 
domain to interaction networks. The majority of the cal- 
culations can be performed on the transactional data. The 
growth trends for our five datasets were often clear after 
only ten students. 

Our network estimators also have implications given our pre- 
vious theories on the network being a sample created from 
biased (non-random) walks on the problem-space, as the 
more homogeneous the biased walkers are, the faster the net- 
work will represent the population and the fewer additional 
states will be explored. We revisited our previous results [4],
and found that students with access to hints explored fewer
unique states overall. This implies that the students were
more similar to each other in terms of the types of actions
and states they visited within the problem. Overall, this
result supports the idea that different populations of students
will have different interaction networks. The implications of
this for generating hints are great. Building hints on one
population might not work as well in another, and adding
interventions or hints can dramatically reduce the number of
states visited by the students. Future work should explore
the possibility of having multiple network representations
and choosing to match the student with the one most closely
resembling them.

Figure 9: Prediction of total final number of goal
states, as observed number of states increases. Note
that for small t, the estimate is very high, but becomes
an underestimate as t increases. P_0 can predict
the number of additional hintable states that
can be added for an additional sample of data.

As shown in Figure 8, our estimator starts out drastically
overestimating the number of unobserved states in the
network. As we collect data, this eventually becomes a slight 
underestimate, eventually converging on the correct number 
of states. One explanation for why this might be the case is 
the method by which undiscovered states are added to the 
network. By using this model for our estimator, we are mak- 
ing an assumption that states are selected independently of 
one another. At the beginning, when data is sparse, this 
assumption is not particularly harmful, since undiscovered 
states are relatively common. However, as our dataset be- 
comes richer, we underestimate the probability of adding 
an unobserved state because we do not take into account 
the effect of “trail-blazing” which increases the probability 
of adding additional unobserved states after the first. Eagle 
and Barnes found that interaction networks had properties 
of scale-free networks [5]. In particular, their degree
distributions follow a power law, with a few vertices having much
higher degree than the average for the network. It is likely 
that taking into account the scale-free and hierarchical na- 
ture of the networks will provide methods to improve on our 
estimators. 

5. CONCLUSIONS AND FUTURE WORK 

We have adapted Good-Turing frequency estimation for use
with networks built from student-tutor interactions. We
found that the estimator for the missing proportion of the
network, P_0, was accurate in predicting the number of new
states discovered with new data. We also found that we
could accurately measure network coverage with I_c for both
the regular network and the network of hintable states.
This provides us with a metric to compare different state
representations as well as determine the suitability of
interaction network methods to different tutoring environments.
We were also able to use these metrics to provide accurate 
predictions for the size of networks expected given more 
data samples, which will be useful for predicting the amount 
of additional data needed to provide a desired amount of 
hintable network coverage. Finally, we used the estimate of 
network coverage to compare different student populations 
to show that the addition of hints in one environment had 
an effect on the number of states explored by students. 

Future work will include expanding on these global measures 
of the network and exploring local measures of coverage. 
Rather than compute coverage for the entire network we 
can use methods such as approach map regioning [4] to find 
meaningful sub-networks and calculate the metrics for those. 
The region level values of P_0 can estimate the “riskiness” of
certain approaches to the problem. The I_c metric can direct
attention to parts of the network that are not well explored, 
perhaps allowing additional hints to be obtained by starting 
advanced users in those areas.